Abstract: We develop a generalized optimization framework for graph-based
semi-supervised learning. The framework gives as particular cases the Standard
Laplacian, Normalized Laplacian and PageRank based methods. We also provide a
new probabilistic interpretation based on random walks and characterize the
limiting behaviour of the methods. The random walk based interpretation allows
us to explain differences between the performances of methods with different
smoothing kernels. It appears that the PageRank based method is robust with
respect to the choice of the regularization parameter and the labelled data.
We illustrate our theoretical results with two realistic datasets that pose
different challenges: the social network of Les Miserables characters and the
Wikipedia hyper-link graph. The graph-based semi-supervised learning classifies
the Wikipedia articles with very good precision and perfect recall employing
only the information about the hyper-text links.
A general optimization framework for semi-supervised learning methods on
graphs

Summary: In this report we propose a generic optimization scheme for
semi-supervised learning on graphs. This framework includes as particular cases
the so-called Standard Laplacian and Normalized Laplacian approaches as well as
a method based on PageRank. We also propose an original probabilistic
interpretation that relies on the notion of random walk, and we then study the
limiting behaviour of these methods. Random walks allow us to explain the
differences in performance between these three smoothing kernels. One of the
main conclusions of this work is that methods built on PageRank are more robust
with respect to the choice of the regularization parameter and of the labelled
points. We illustrate our theoretical results with two real datasets
representative of two distinct challenges: social networks, with the characters
of the novel "Les Miserables", and hyper-link graphs, through the Wikipedia
application. In particular, we demonstrate that it is possible to classify
Wikipedia articles with very good precision and very good recall, using only
the information provided by the hyper-text links.

Keywords: Semi-supervised Learning, PageRank, Random Walk on Graphs, Automatic
Classification of Wikipedia Articles
1 Introduction

Semi-supervised classification is a special form of classification. Traditional
classifiers use only labeled data for training. Semi-supervised learning uses a
large amount of unlabeled data, together with labeled data, to build better
classifiers. Semi-supervised learning requires less human effort and gives high
accuracy. Graph-based semi-supervised methods define a graph whose nodes are
the labeled and unlabeled instances in the dataset and whose edges (possibly
weighted) reflect the similarity of instances. These methods usually assume
label smoothness over the graph (see the excellent book on graph-based
semi-supervised learning). In this work we often omit the term "graph-based",
as it is clear that we only consider graph-based semi-supervised learning
methods.
Up to the present, most of the literature on graph-based semi-supervised
learning has studied the following two methods: the Standard Laplacian based
method (see e.g., [15]) and the Normalized Laplacian based method (see e.g.,
[2]). Here
we propose a generalized optimization framework which contains the above two
methods as particular cases. Moreover, our generalized optimization framework
gives the PageRank based method as another particular case. The PageRank based
methods have been proposed in [3] as a classification stage in a clustering
method for large hyper-text document collections. In [3] only a linear
algebraic formulation was proposed, but not the optimization formulation. A
great advantage of the PageRank based method is that it has quasi-linear
complexity. In [5] a method also based on PageRank has been proposed. However,
the method of [5] cannot be scaled to large datasets as it is based on the
K-means method. The generalized optimization framework allows us to provide an
intuitive interpretation of the differences between particular cases. Using the
terminology of random walks on graphs we also provide a new probabilistic
interpretation for the Standard Laplacian based method and the PageRank based
method. With the help of the random walk terminology we are able to explain
differences in classifications provided by the Standard Laplacian based method
and the PageRank based method. The generalized optimization framework has only
two free parameters to tune. By choosing the first parameter, we vary the level
of credit that we give to nodes with large degree. By choosing the second
parameter, the regularization parameter, we choose a trade-off between the
closeness of the classification function to the labeling function and the
smoothness of the classification function over the graph. We study the
sensitivity of the methods with respect to the value of the regularization
parameter. We conclude that only the PageRank based method shows robustness
with respect to the choice of the value of the regularization parameter. We
illustrate our theoretical results and obtain further insights from two
datasets. The first dataset is a graph of co-appearance of the characters in
the novel Les Miserables. The second dataset is a collection of articles from
Wikipedia for which we have expert classification. We have compared the quality
of classification of the graph-based semi-supervised learning methods with the
quality of classification based on Wikipedia categories. It is remarkable to
observe that with just a few labeled points and only using the hyper-text
links, the graph-based semi-supervised methods perform nearly as well as
Wikipedia categories in terms of precision and even better in terms of recall.
With the help of the two datasets we confirm that the PageRank based method is
more robust than the other two methods with respect to the value of the
regularization parameter and with respect to the choice of labeled points.
The rest of the paper is organized as follows: In Section 2 we describe a
generalized optimization framework for graph-based semi-supervised learning
and discuss differences among particular cases. In Section 3 we demonstrate our
theoretical results by numerical examples. We conclude the paper in Section 4
with directions for future research.



2 Generalized Optimization Framework 

The input to a semi-supervised classification consists of a set of data
instances $X = \{X_1, \ldots, X_P, X_{P+1}, \ldots, X_N\}$. An instance could
be described by a fixed collection of attributes. For example, all attributes
can take real numbers as values and these numbers can be normalized. Suppose we
have $K$ classes and the first $P$ instances in our dataset are labeled as
$k(i) \in \{1, \ldots, K\}$, $i = 1, \ldots, P$. Let the matrix $W$ represent
degrees of similarity between instances in $X$. The construction of $W$ can be
done by various methods. If we continue with the example where attributes are
given by normalized real numbers, the Radial Basis Function (RBF)

$$W_{ij} = \exp\left(-\|X_i - X_j\|^2 / \gamma\right)$$

or the k-Nearest Neighbors (kNN) method

$$W_{ij} = \begin{cases} 1, & \text{if } X_j \text{ is one of the } k \text{ nearest neighbors of } X_i, \\ 0, & \text{otherwise} \end{cases}$$

can be chosen to construct the similarity matrix. In the datasets of this
article we assume that the matrix $W$ is symmetric. The RBF method in fact
gives a symmetric similarity matrix. In general, the kNN method can give a
non-symmetric matrix, but it can easily be transformed into a symmetric one by
$W = (W + W^T)/2$. Denote by $D$ the diagonal matrix whose $(i,i)$-element
equals the sum of the $i$-th row of the matrix $W$:

$$d_{ii} = \sum_{j=1}^{N} W_{ij}.$$

In some applications, which is also the case for our datasets, the similarity
graph is available as a part of the data.
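The two constructions of $W$ described above can be sketched as follows. This is a minimal illustration in plain Python (no external libraries); the function names, the toy points and the value of `gamma` are assumptions made for the example and do not come from the original experiments.

```python
import math

def squared_distance(x, y):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rbf_similarity(points, gamma):
    """W_ij = exp(-||X_i - X_j||^2 / gamma); symmetric by construction."""
    return [[math.exp(-squared_distance(x, y) / gamma) for y in points]
            for x in points]

def knn_similarity(points, k):
    """W_ij = 1 if X_j is one of the k nearest neighbours of X_i, else 0."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: squared_distance(points[i], points[j]))
        for j in others[:k]:
            W[i][j] = 1.0
    return W

def symmetrize(W):
    """Return (W + W^T) / 2, which is symmetric even if W is not."""
    n = len(W)
    return [[(W[i][j] + W[j][i]) / 2 for j in range(n)] for i in range(n)]

def degrees(W):
    """Diagonal of D: d_ii is the sum of the i-th row of W."""
    return [sum(row) for row in W]

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]    # hypothetical toy data
W_rbf = rbf_similarity(points, gamma=2.0)
W_knn = symmetrize(knn_similarity(points, k=1))  # kNN alone is not symmetric here
print(degrees(W_knn))
```

In this toy case the third point names the second as its nearest neighbour but not the other way round, so the symmetrization step produces a weight of 0.5 on that edge.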
Define the $N \times K$ matrix $Y$ as

$$Y_{ik} = \begin{cases} 1, & \text{if } X_i \text{ is labeled as } k(i) = k, \\ 0, & \text{otherwise.} \end{cases}$$
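As a small illustration, the matrix $Y$ can be built from a mapping of labeled nodes to classes. The helper names are hypothetical, class indices are 0-based here (unlike the 1-based $k(i)$ of the text), and the column normalization is one plausible reading of "normalized labeling functions" used later in the paper (each column rescaled to sum to one).

```python
def label_matrix(labels, n_nodes, n_classes):
    """Y[i][k] = 1 if node i is labeled with class k, else 0.

    `labels` maps a node index to its class index; unlabeled nodes are absent.
    """
    Y = [[0.0] * n_classes for _ in range(n_nodes)]
    for i, k in labels.items():
        Y[i][k] = 1.0
    return Y

def normalize_columns(Y):
    """Rescale each labeling function (column of Y) so that it sums to one."""
    n, K = len(Y), len(Y[0])
    sums = [sum(Y[i][k] for i in range(n)) for k in range(K)]
    return [[Y[i][k] / sums[k] if sums[k] else 0.0 for k in range(K)]
            for i in range(n)]

# Five nodes, two classes; nodes 0 and 1 are labeled class 0, node 4 class 1.
Y = label_matrix({0: 0, 1: 0, 4: 1}, n_nodes=5, n_classes=2)
Y_norm = normalize_columns(Y)
```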



We refer to each column $Y_{\cdot k}$ of the matrix $Y$ as a labeling function.
Also define the $N \times K$ matrix $F$ and call its columns $F_{\cdot k}$
classification functions. The general idea of graph-based semi-supervised
learning is to find classification functions so that, on the one hand, they are
close to the corresponding labeling functions and, on the other hand, they
change smoothly over the graph associated with the similarity matrix. This
general idea can be expressed with the help of an optimization formulation. In
particular, there are two widely used optimization frameworks. Below,
$F_{i\cdot}$ and $Y_{i\cdot}$ denote the $i$-th rows of $F$ and $Y$. The first
formulation, the Standard Laplacian based formulation [15], is as follows:

$$\min_F \left\{ \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \|F_{i\cdot} - F_{j\cdot}\|^2 + \mu \sum_{i=1}^{N} d_{ii} \|F_{i\cdot} - Y_{i\cdot}\|^2 \right\} \tag{1}$$

and the second, the Normalized Laplacian based formulation [2], is as follows:

$$\min_F \left\{ \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \left\| \frac{F_{i\cdot}}{\sqrt{d_{ii}}} - \frac{F_{j\cdot}}{\sqrt{d_{jj}}} \right\|^2 + \mu \sum_{i=1}^{N} \|F_{i\cdot} - Y_{i\cdot}\|^2 \right\} \tag{2}$$

where $\mu$ is a regularization parameter. In fact, the parameter $\mu$
represents a trade-off between the closeness of the classification function to
the labeling function and its smoothness.

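Here we propose a generalized optimization framework, which has as particular
cases the two above mentioned formulations. Namely, we suggest the following
optimization formulation

$$\min_F \left\{ \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \left\| d_{ii}^{\sigma-1} F_{i\cdot} - d_{jj}^{\sigma-1} F_{j\cdot} \right\|^2 + \mu \sum_{i=1}^{N} d_{ii}^{2\sigma-1} \|F_{i\cdot} - Y_{i\cdot}\|^2 \right\} \tag{3}$$

In addition to the Standard Laplacian formulation ($\sigma = 1$) and the
Normalized Laplacian formulation ($\sigma = 1/2$), we obtain a third very
interesting case when $\sigma = 0$. We show below that this particular case
corresponds to PageRank based clustering [3], for which (3) can be rewritten
as:

$$\min_F \left\{ \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \left\| \frac{F_{i\cdot}}{d_{ii}} - \frac{F_{j\cdot}}{d_{jj}} \right\|^2 + \mu \sum_{i=1}^{N} \frac{1}{d_{ii}} \|F_{i\cdot} - Y_{i\cdot}\|^2 \right\}$$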
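To make the role of $\sigma$ concrete, the generalized objective can be evaluated directly from its definition: a smoothness term with rows scaled by $d_{ii}^{\sigma-1}$ plus a fitting term weighted by $d_{ii}^{2\sigma-1}$. The sketch below is only an illustration in plain Python; the function name and the three-node path graph are assumptions for the example, and it presumes that no node has zero degree.

```python
def generalized_objective(W, F, Y, sigma, mu):
    """Smoothness term plus mu times the degree-weighted fitting term."""
    n, K = len(W), len(F[0])
    d = [sum(row) for row in W]          # d[i] plays the role of d_ii
    smooth = 0.0
    for i in range(n):
        for j in range(n):
            if W[i][j]:
                si, sj = d[i] ** (sigma - 1), d[j] ** (sigma - 1)
                smooth += W[i][j] * sum((si * F[i][k] - sj * F[j][k]) ** 2
                                        for k in range(K))
    fit = sum(d[i] ** (2 * sigma - 1) *
              sum((F[i][k] - Y[i][k]) ** 2 for k in range(K))
              for i in range(n))
    return smooth + mu * fit

# Path graph 0 - 1 - 2 with one class; node 0 is labeled, F also marks node 1.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
F = [[1.0], [1.0], [0.0]]
Y = [[1.0], [0.0], [0.0]]
for sigma in (1, 0.5, 0):
    print(sigma, generalized_objective(W, F, Y, sigma, mu=2.0))
```

The same candidate $F$ is penalized differently by the three kernels, because the middle node has the largest degree and $\sigma$ controls how much that degree counts.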



Since the objective function of the generalized optimization framework is a 
sum of a positive semi-definite quadratic form and a positive quadratic form, 
we can state the following proposition. 

Proposition 1 The objective of the generalized optimization framework for 
semi-supervised learning is a convex function. 

One way to find F is to apply one of many efficient optimization methods 
for convex optimization. Another way to find F is to find it as a solution of 
the first order optimality condition. Fortunately, we can even find F in explicit 
form. 

Proposition 2 The classification functions for the generalized semi-supervised
learning are given by

$$F_{\cdot k} = \frac{\mu}{2+\mu} \left( I - \frac{2}{2+\mu} D^{-\sigma} W D^{\sigma-1} \right)^{-1} Y_{\cdot k}, \tag{4}$$

for $k = 1, \ldots, K$.



RR n° 7774 



Graph-based Semi-supervised Learning 






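Proof: The objective function of the generalized semi-supervised learning
framework can be rewritten in the following matrix form

$$Q(F) = 2 \sum_{k=1}^{K} F_{\cdot k}^T D^{\sigma-1} L D^{\sigma-1} F_{\cdot k} + \mu \sum_{k=1}^{K} (F_{\cdot k} - Y_{\cdot k})^T D^{2\sigma-1} (F_{\cdot k} - Y_{\cdot k}),$$

where $L = D - W$ is the Standard Laplacian. The first order optimality
condition $D_{F_{\cdot k}} Q(F) = 0$ gives

$$2 F_{\cdot k}^T \left( D^{\sigma-1} L D^{\sigma-1} + D^{\sigma-1} L^T D^{\sigma-1} \right) + 2\mu (F_{\cdot k} - Y_{\cdot k})^T D^{2\sigma-1} = 0.$$

Multiplying the above expression from the right hand side by $D^{-2\sigma+1}$,
we obtain

$$2 F_{\cdot k}^T D^{\sigma-1} (L + L^T) D^{-\sigma} + 2\mu (F_{\cdot k} - Y_{\cdot k})^T = 0.$$

Then, substituting $L = D - W$ and rearranging the terms yields

$$F_{\cdot k}^T \left( 2I - D^{\sigma-1} (W + W^T) D^{-\sigma} + \mu I \right) - \mu Y_{\cdot k}^T = 0.$$

Since $W$ is a symmetric matrix, we obtain

$$F_{\cdot k}^T \left( 2I - 2 D^{\sigma-1} W D^{-\sigma} + \mu I \right) - \mu Y_{\cdot k}^T = 0.$$

Thus, we have

$$F_{\cdot k}^T = \mu Y_{\cdot k}^T \left( 2I - 2 D^{\sigma-1} W D^{-\sigma} + \mu I \right)^{-1},$$

which proves the proposition.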
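The explicit form of Proposition 2 can also be checked numerically. The sketch below does not invert a matrix; as an assumption made to keep the example dependency-free, it iterates $F \leftarrow (1-\alpha) Y + \alpha M F$ with $\alpha = 2/(2+\mu)$ and $M = D^{-\sigma} W D^{\sigma-1}$, whose fixed point is the expression in the proposition. The function names and the toy graph are hypothetical.

```python
def kernel(W, sigma):
    """Smoothing kernel M = D^-sigma W D^(sigma-1), as a list of rows."""
    d = [sum(row) for row in W]
    n = len(W)
    return [[d[i] ** (-sigma) * W[i][j] * d[j] ** (sigma - 1) for j in range(n)]
            for i in range(n)]

def classification_functions(W, Y, sigma, mu, iters=2000):
    """Approximate F = mu/(2+mu) * (I - 2/(2+mu) M)^-1 Y by fixed-point iteration."""
    n, K = len(W), len(Y[0])
    alpha = 2.0 / (2.0 + mu)             # so that 1 - alpha = mu / (2 + mu)
    M = kernel(W, sigma)
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[(1 - alpha) * Y[i][k]
              + alpha * sum(M[i][j] * F[j][k] for j in range(n))
              for k in range(K)] for i in range(n)]
    return F

# Path graph 0 - 1 - 2; node 0 is labeled class 0 and node 2 class 1.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
F = classification_functions(W, Y, sigma=0.0, mu=1.0)
labels = [max(range(2), key=lambda k: F[i][k]) for i in range(3)]
```

Each node is assigned to the class whose classification function is largest at that node; the middle node is equidistant from both labeled nodes, so its assignment is a tie.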

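As a corollary, we have explicit expressions for the classification functions
for the three above mentioned particular semi-supervised learning methods.
Namely, from expression (4) we derive

• if $\sigma = 1$, the Standard Laplacian method:
  $F_{\cdot k} = \frac{\mu}{2+\mu} \left( I - \frac{2}{2+\mu} D^{-1} W \right)^{-1} Y_{\cdot k}$,

• if $\sigma = 1/2$, the Normalized Laplacian method:
  $F_{\cdot k} = \frac{\mu}{2+\mu} \left( I - \frac{2}{2+\mu} D^{-1/2} W D^{-1/2} \right)^{-1} Y_{\cdot k}$,

• if $\sigma = 0$, the PageRank based method:
  $F_{\cdot k} = \frac{\mu}{2+\mu} \left( I - \frac{2}{2+\mu} W D^{-1} \right)^{-1} Y_{\cdot k}$.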
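The three cases differ only in the kernel $D^{-\sigma} W D^{\sigma-1}$. As a quick illustration (function name and weighted toy graph are assumptions), the sketch below builds the kernel for each value of $\sigma$ and checks its characteristic structure: row-stochastic for $\sigma = 1$, symmetric for $\sigma = 1/2$, and column-stochastic for $\sigma = 0$.

```python
def smoothing_kernel(W, sigma):
    """Return D^-sigma W D^(sigma-1) as a list of rows."""
    d = [sum(row) for row in W]
    n = len(W)
    return [[d[i] ** (-sigma) * W[i][j] * d[j] ** (sigma - 1) for j in range(n)]
            for i in range(n)]

# Small weighted symmetric graph with degrees 3, 2 and 1.
W = [[0, 2, 1], [2, 0, 0], [1, 0, 0]]

standard = smoothing_kernel(W, 1.0)     # D^-1 W
normalized = smoothing_kernel(W, 0.5)   # D^-1/2 W D^-1/2
pagerank = smoothing_kernel(W, 0.0)     # W D^-1

row_sums = [sum(row) for row in standard]
col_sums = [sum(pagerank[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```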



Let us now explain why the case $\sigma = 0$ corresponds to the PageRank based
clustering method. Denote $\alpha = 2/(2 + \mu)$ and write $F_{\cdot k}$ in a
transposed form

$$F_{\cdot k}^T = (1 - \alpha) Y_{\cdot k}^T (I - \alpha D^{-1} W)^{-1}.$$

If the labeling functions are normalized, this is exactly an explicit
expression for PageRank [10, 3]. This expression was used in [3] but no
optimization framework was provided.
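The transposed expression above is the stationary point of the familiar PageRank recursion with restart, $p \leftarrow (1-\alpha) v + \alpha\, p\, D^{-1} W$, where the restart vector $v$ is the normalized labeling function. The following sketch classifies nodes this way on a toy path graph; the function names, the graph and the value of `mu` are illustrative assumptions, not the setup of [3].

```python
def personalized_pagerank(W, v, alpha, iters=500):
    """Iterate p <- (1 - alpha) v + alpha p D^-1 W for a row vector p."""
    n = len(W)
    d = [sum(row) for row in W]
    p = v[:]
    for _ in range(iters):
        p = [(1 - alpha) * v[j]
             + alpha * sum(p[i] * W[i][j] / d[i] for i in range(n))
             for j in range(n)]
    return p

def classify(W, seeds_per_class, mu):
    """Assign each node to the class with the largest PageRank score."""
    alpha = 2.0 / (2.0 + mu)
    n = len(W)
    scores = []
    for seeds in seeds_per_class:
        # Normalized labeling function: uniform over the labeled nodes.
        v = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
        scores.append(personalized_pagerank(W, v, alpha))
    return [max(range(len(scores)), key=lambda k: scores[k][i])
            for i in range(n)]

# Path graph 0 - 1 - 2 - 3 with one labeled node at each end.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
classes = classify(W, [[0], [3]], mu=0.5)
ppr = personalized_pagerank(W, [1.0, 0.0, 0.0, 0.0], alpha=0.8)
print(classes)
```

Each iteration costs one pass over the edges, which is one way to see why the PageRank based method scales well.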






Note that $D^{-1} W$ represents the transition probability matrix for the
random walk on the similarity graph. Then, the $(i,j)$-th element of the matrix
$(I - \alpha D^{-1} W)^{-1}$ gives the expected number of visits to node $j$
starting from node $i$ until the random walk restarts with probability
$1 - \alpha$. This observation provides the following probabilistic
interpretation for the Standard Laplacian and PageRank based methods. In the
Standard Laplacian method, $F_{ik}$ gives up to a multiplicative constant the
expected number of visits before restart to the labeled nodes of class $k$ if
the random walk starts from node $i$. In the PageRank based method with
normalized labeling functions, $F_{ik}$ gives up to a multiplicative constant
the expected number of visits to node $i$, if the random walk starts from a
uniform distribution over the labeled nodes of class $k$.
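The expected-visits reading can be illustrated by expanding $(I - \alpha D^{-1} W)^{-1}$ as the series $\sum_{t \ge 0} (\alpha D^{-1} W)^t$: the $t$-th term is the probability that the walk has not yet restarted after $t$ steps and sits at node $j$. The sketch below (function name, graph and truncation length are assumptions) sums this series and confirms that the expected total number of visits before restart is $1/(1-\alpha)$ from any starting node.

```python
def expected_visits(W, alpha, terms=200):
    """Approximate (I - alpha D^-1 W)^-1 by a truncated power series."""
    n = len(W)
    d = [sum(row) for row in W]
    P = [[W[i][j] / d[i] for j in range(n)] for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in power]      # t = 0 term: the start node itself
    for _ in range(terms):
        power = [[alpha * sum(power[i][l] * P[l][j] for l in range(n))
                  for j in range(n)] for i in range(n)]
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total

W = [[0, 2, 1], [2, 0, 0], [1, 0, 0]]
G = expected_visits(W, alpha=0.5)
print([sum(row) for row in G])   # each row sums to about 1 / (1 - alpha)
```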

The random walk approach can explain why in some cases the Standard Laplacian
and PageRank based methods provide different classifications. For instance,
consider a case when a node $v$ is directly connected to the labeled nodes
$k_1$ and $k_2$ belonging to different classes. Furthermore, let the labeled
node $k_1$ have a higher degree than the node $k_2$ and let the node $k_1$
belong to a denser cluster than the node $k_2$. From [4] we know that the
expected number of visits to node $j$ starting from node $i$ until the restart
is equal to the product of the probability to visit node $j$ before the
absorption and the expected number of returns to node $j$ starting from node
$j$. Then, the PageRank based method will classify the node $v$ into the class
of the labeled node $k_2$, as it is more likely that the random walk misses the
node $v$ when starting from node $k_1$. In other words, when the random walk
starts from $k_2$, there are fewer options for choosing the next node and it is
more likely to choose node $v$ as the next node. In the Standard Laplacian
method we need to compare the average number of visits to the labeled nodes
starting from the node $v$. Since the random walk can reach either node $k_1$
or node $k_2$ in one step, the probabilities of hitting these nodes before
absorption are similar and what matters is how dense the classes are. If the
class associated with the labeled node $k_1$ is denser than the class
associated with the labeled node $k_2$, the node $v$ will be classified into
the class associated with $k_1$. We shall illustrate the above reasoning by a
specific example in the next section.

Based on the formulation (3), we can give some further intuitive interpretation
for various cases of the generalized semi-supervised learning. Let us consider
the first term in the r.h.s. sum of (3), which corresponds to the smoothness
component. Figure 1(a) shows that if $\sigma < 1$ we do not give much credit to
the connections between points with large degrees. Let us now consider the
second term, which corresponds to the fitting function. Figure 1(b) shows that
$\sigma < 1/2$ does not give much credit to samples that pertain to a dense
cluster of points (i.e. $d_{ii}$ is large), whereas samples that are relatively
isolated in the feature space (corresponding to a small value of $d_{ii}$) are
given higher confidence. If $\sigma = 1/2$, the node degree does not have any
influence. And if $\sigma > 1/2$, we consider that the nodes with higher
weighted degree are more important than the nodes with smaller degree.
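The degree dependence of the two terms can be tabulated directly from the formulation: the smoothness term scales each row of $F$ by $d_{ii}^{\sigma-1}$ and the fitting term weights each node by $d_{ii}^{2\sigma-1}$. The short sketch below prints both factors for a low-degree and a high-degree node; the function names and the two sample degrees are chosen only for illustration.

```python
def smoothness_scale(degree, sigma):
    """Factor d^(sigma-1) applied to a node's row of F in the smoothness term."""
    return degree ** (sigma - 1)

def fitting_weight(degree, sigma):
    """Weight d^(2*sigma-1) of a node in the fitting term."""
    return degree ** (2 * sigma - 1)

for sigma in (0, 0.5, 1):
    for degree in (2, 50):
        print(f"sigma={sigma} degree={degree} "
              f"smoothness scale={smoothness_scale(degree, sigma):.3f} "
              f"fitting weight={fitting_weight(degree, sigma):.3f}")
```

For $\sigma = 1/2$ the fitting weight is 1 whatever the degree, for $\sigma = 0$ it is $1/d_{ii}$, and for $\sigma = 1$ it is $d_{ii}$ itself.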

Figure 1: Fitting and smoothness terms. (a) Smoothness term. (b) Fitting term.

Next we analyze the limiting behavior of the semi-supervised learning methods
when $\mu \to 0$ ($\alpha \to 1$). We shall use the following Laurent series
expansion

$$(1 - \alpha) [I - \alpha D^{-1} W]^{-1} = \alpha [\mathbf{1}\pi + (1 - \alpha) H + o(1 - \alpha)], \tag{5}$$

where $\pi$ is the stationary distribution of the random walk
($\pi D^{-1} W = \pi$), $\mathbf{1}$ is a vector of ones of appropriate
dimension and $H = (I - D^{-1} W + \mathbf{1}\pi)^{-1} - \mathbf{1}\pi$ is the
deviation matrix [12]. Let us note that if the similarity matrix $W$ is
symmetric, the random walk governed by the transition matrix $D^{-1} W$ is
time-reversible and its stationary distribution is given by

$$\pi = (\mathbf{1}^T D \mathbf{1})^{-1} \mathbf{1}^T D. \tag{6}$$
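Formula (6) says that the stationary probability of a node is its degree divided by the total degree. A minimal check, with hypothetical function names and a small weighted symmetric graph, is to apply one step of the transition matrix $D^{-1} W$ to this vector and observe that it does not change.

```python
def stationary_distribution(W):
    """pi_i = d_ii / sum_j d_jj for a symmetric similarity matrix W."""
    d = [sum(row) for row in W]
    total = sum(d)
    return [di / total for di in d]

def apply_transition(pi, W):
    """Return the row vector pi D^-1 W."""
    n = len(W)
    d = [sum(row) for row in W]
    return [sum(pi[i] * W[i][j] / d[i] for i in range(n)) for j in range(n)]

W = [[0, 2, 1], [2, 0, 0], [1, 0, 0]]      # degrees 3, 2, 1
pi = stationary_distribution(W)
pi_next = apply_transition(pi, W)
print(pi, pi_next)
```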

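Let us insert the Laurent series expansion (5) into the expression for the
general classification function (4):

$$\begin{aligned}
F_{\cdot k} &= (1 - \alpha) (I - \alpha D^{-\sigma} W D^{\sigma-1})^{-1} Y_{\cdot k} \\
&= (1 - \alpha) [D^{-\sigma+1} (I - \alpha D^{-1} W) D^{\sigma-1}]^{-1} Y_{\cdot k} \\
&= (1 - \alpha) D^{-\sigma+1} [I - \alpha D^{-1} W]^{-1} D^{\sigma-1} Y_{\cdot k} \\
&= \alpha D^{-\sigma+1} [\mathbf{1}\pi + (1 - \alpha) H + o(1 - \alpha)] D^{\sigma-1} Y_{\cdot k} \\
&= \alpha \Big[ D^{-\sigma+1} \mathbf{1} \sum_{i: k(i) = k} \pi_i d_{ii}^{\sigma-1} + (1 - \alpha) D^{-\sigma+1} H D^{\sigma-1} Y_{\cdot k} + o(1 - \alpha) \Big].
\end{aligned} \tag{7}$$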
Next, using the expression for the stationary distribution (6), we can specify
(7) as follows:

$$F_{\cdot k} = \alpha \Big[ D^{-\sigma+1} \mathbf{1} (\mathbf{1}^T D \mathbf{1})^{-1} \sum_{i: k(i) = k} d_{ii}^{\sigma} + (1 - \alpha) D^{-\sigma+1} H D^{\sigma-1} Y_{\cdot k} + o(1 - \alpha) \Big]. \tag{8}$$

Hence, in the semi-supervised methods, when $\alpha$ is sufficiently close to
one, the class with the largest $\sum_{i: k(i) = k} d_{ii}^{\sigma}$ attracts
all instances. This implies that the limiting behavior of the PageRank based
method ($\sigma = 0$) is quite different from the limiting behaviors of the
other methods. In particular, if the number of labelled points in each class is
the same or if the labeling functions are normalized, then there is no
dominating class which attracts all instances and the classification results
most likely make sense even for $\alpha$ very close to one. The conclusion is
that the PageRank based method is more robust to the choice of the
regularization parameter than the other graph-based semi-supervised learning
methods.
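The step that produces the dominating term may be easier to follow when written out. Using the stationary distribution of the random walk, $\pi_i = d_{ii} / (\mathbf{1}^T D \mathbf{1})$, and the fact that $Y_{ik}$ equals one exactly for the labeled points of class $k$, the scalar multiplying $D^{-\sigma+1}\mathbf{1}$ works out as below; the symbol $P_k$ for the number of labeled points of class $k$ is introduced here only for this illustration.

```latex
\begin{aligned}
\pi D^{\sigma-1} Y_{\cdot k}
  &= \sum_{i=1}^{N} \pi_i \, d_{ii}^{\sigma-1} \, Y_{ik}
   = \sum_{i:k(i)=k} \frac{d_{ii}}{\mathbf{1}^T D \mathbf{1}} \, d_{ii}^{\sigma-1}
   = \frac{1}{\mathbf{1}^T D \mathbf{1}} \sum_{i:k(i)=k} d_{ii}^{\sigma}, \\
\sigma = 0: \quad
\pi D^{-1} Y_{\cdot k}
  &= \frac{P_k}{\mathbf{1}^T D \mathbf{1}} .
\end{aligned}
```

For $\sigma = 0$ the leading term therefore depends only on how many points are labeled in each class, not on their degrees, which is why equal label counts remove the dominating class.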
Illustrating example: to illustrate the limiting behaviour of the methods we
generated an artificial example of the planted partition random graph model [B]
with two classes with 100 nodes in each class. The probability of link creation
inside the first class is 0.3 and the probability of link creation inside the
second class is 0.1. So the first class is three times denser than the second
class. The probability of link creation between the two classes is 0.05. We
have generated a sample of this random graph model. In each class we have
chosen just one labelled point. In the first class we have chosen as the
labelled point the point with the smallest degree (degree=28, 24 edges inside
the class and 4 edges leading outside). In the second class we have chosen as
the labelled point the point with the largest degree (degree=31, 27 edges
inside the class and 4 edges leading outside). We have indeed observed that the
second class attracts all points when $\alpha$ is close to one for all
semi-supervised methods except for the PageRank based method. This is in
accordance with the theoretical conclusions, as the labelled point in the
second class has a larger weight than the labelled point in the first class. It
is interesting to observe that in this example the first class loses all points
when $\alpha$ is close to one even though the first class is denser than the
second one.
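The weights that decide the limiting behaviour in this example can be computed from the two reported degrees alone. The sketch below evaluates the sum of $d^{\sigma}$ over the labeled points of each class for the three methods; only the degrees 28 and 31 come from the text, while the function and variable names are illustrative.

```python
def class_weight(labeled_degrees, sigma):
    """Sum of d^sigma over the labeled points of one class."""
    return sum(d ** sigma for d in labeled_degrees)

# One labeled point per class, with the degrees reported in the example.
labeled = {"first": [28], "second": [31]}

for sigma, name in ((1.0, "Standard Laplacian"),
                    (0.5, "Normalized Laplacian"),
                    (0.0, "PageRank")):
    weights = {c: class_weight(ds, sigma) for c, ds in labeled.items()}
    print(name, weights)
```

For $\sigma = 1$ and $\sigma = 1/2$ the second class has the larger weight, while for $\sigma = 0$ both weights equal one, in line with the observation that only the PageRank based method avoids a dominating class here.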
3 Experiments

In this section we apply the developed theory to two datasets. The first
dataset is the network of interactions between major characters in the novel
Les Miserables. If two characters participate in one or more scenes, there is a
link between these two characters. The second dataset is a subset of Wikipedia
pages. Wikipedia articles correspond to the data points and hyper-text links
correspond to the edges of the similarity graph. We disregard the direction of
the hyper-text links.



3.1 Les Miserables example 

The graph of the interactions of Les Miserables characters has been compiled
by Knuth. There are 77 nodes and 508 edges in the graph. Using the betweenness
based algorithm of Newman [11] we obtain 6 clusters which can be identified
with the main characters: Valjean (17), Myriel (10), Gavroche (18), Cosette
(10), Thenardier (12), Fantine (10), where in brackets we give the number of
nodes in the respective cluster. We have randomly generated labeled points 100
times (one labeled point per cluster). In Figure 2(a) we plot the modularity
measure averaged over 100 experiments as a function of $\alpha$ for methods
with different values of $\sigma$ ranging from 0 to 1 with granularity 0.1. The
modularity measure is based on the inter-cluster link density and the average
link density and reflects the quality of clustering [11]. From Figure 2(a) we
conclude that on average the PageRank based method performs best in terms of
modularity and it is robust with respect to the choice of the regularization
parameter. In particular, we observe that, as was predicted by the theory, the
Standard Laplacian method and the Normalized Laplacian method perform badly
when $\alpha$ is close to 1 (one class attracts all instances). The PageRank
based method is robust even for values of $\alpha$ which are very close to one.
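For readers who want to reproduce the quality measure, the sketch below computes a modularity score for a given classification. It assumes the common Newman-Girvan definition, $Q = \frac{1}{2m} \sum_{i,j} \big(W_{ij} - \frac{d_{ii} d_{jj}}{2m}\big)\,\delta(c_i, c_j)$ with $2m$ the total degree; the text only states that the measure follows [11], so the exact variant used in the experiments may differ.

```python
def modularity(W, classes):
    """Newman-Girvan modularity of a partition (assumed definition)."""
    n = len(W)
    d = [sum(row) for row in W]
    two_m = sum(d)                       # total degree, i.e. 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if classes[i] == classes[j]:
                q += W[i][j] - d[i] * d[j] / two_m
    return q / two_m

def triangle_pair():
    """Adjacency matrix of two disjoint triangles on nodes 0-2 and 3-5."""
    W = [[0] * 6 for _ in range(6)]
    for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
        W[a][b] = W[b][a] = 1
    return W

W = triangle_pair()
print(modularity(W, [0, 0, 0, 1, 1, 1]))   # each triangle is its own class
print(modularity(W, [0, 0, 0, 0, 0, 0]))   # one class attracts all nodes
```

A classification in which one class attracts all instances scores zero under this definition, which matches the poor modularity reported for the Laplacian methods when $\alpha$ is close to one.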

Next let us use the random walk based interpretation to explain differences
between the Standard Laplacian based method and the PageRank based method. Let
us consider the node Woman 2 (see Figure 2(b)). The node Woman 2 is connected
with three other nodes: Valjean, Cosette and Javert. Suppose we have chosen
labeled points so that only the nodes Valjean and Cosette are labeled but not
Javert. Since the node Valjean has many more links than the node Cosette, the
random walk starting from the node Valjean is less likely to hit the node
Woman 2 in some given time than the random walk starting from the node Cosette.
Thus, the PageRank based method classifies the node Woman 2 into the class
corresponding to Cosette. Since the node Woman 2 is just one link away from
both Valjean and Cosette, the probabilities of hitting these nodes before
absorption are approximately equal. Thus, if we apply the Standard Laplacian
method, the classification will be determined by the expected number of returns
to the labeled nodes before absorption. Since the labeled node Valjean lies in
the larger and denser class, the Standard Laplacian method classifies the node
Woman 2 into the class corresponding to Valjean.



Figure 2: Les Miserables example. (a) Modularity as a function of $\alpha$. (b) Difference in classifications.

3.2 Wikipedia-math example

The second dataset is derived from the English language Wikipedia. In this
case, the similarity graph is constructed by a slight modification of the
hyper-text graph. Each Wikipedia article typically contains links to other
Wikipedia articles which are used to explain specific terms and concepts. Thus,
Wikipedia forms a graph whose nodes represent articles and whose edges
represent hyper-text inter-article links. For our experiments we took a
snapshot (dump) of Wikipedia from January 30, 2010¹. Based on this dump we have
extracted outgoing links to other articles. The links to special pages
(categories, portals, etc.) have been ignored. In the present experiment we did
not use the information about the direction of links, so the graph in our
experiments is undirected. Then we have built a subgraph with mathematics
related articles, a list of which was obtained from the "List of mathematics
articles" page from the same dump. In the present experiments we have chosen
the following three mathematical topics: "Discrete mathematics" (DM),
"Mathematical analysis" (MA), "Applied mathematics" (AM). With the help of the
AMS MSC Classification² and experts we have classified related Wikipedia
mathematical articles into the three above mentioned topics. According to the
expert annotation we have built a subgraph of the Wikipedia mathematical
articles providing imbalanced classes DM (106), MA (368) and AM (435). The
subgraph induced by these three topics is connected and contains 909 articles.
Then, the similarity matrix $W$ is just the adjacency matrix of this subgraph.
Thus, $W_{ij} = 1$ means that Wikipedia article $i$ is connected with Wikipedia
article $j$. Then, we have chosen uniformly at random 100 times 5 labeled nodes
for each class. In Figure 3(a) we plot the modularity averaged over 100
experiments as a function of $\alpha$ for methods with different values of
$\sigma$ ranging from 0 to 1 with granularity 0.1. Figure 3(a) confirms the
observations obtained from the Les Miserables dataset that the PageRank based
method ($\sigma = 0$) has the best performance in terms of the modularity
measure. Next, in Figure 3(b) we plot the precision as a function of the
regularization parameter for each of the three methods with respect to the
expert classification. For most values of $\alpha$ the PageRank based method
performs better than all the other methods and shows robust behaviour when the
regularization parameter approaches one. This is in agreement with the
theoretical conclusions at the end of Section 2. Both Figure 3(a) and Figure
3(b) demonstrate that the PageRank based method is also more robust than the
other two methods with respect to the choice of labeled points.

¹ http://download.wikimedia.org/enwiki/20100130
² http://www.ams.org/mathscinet/msc/msc2010.html

Figure 3(a) and Figure 3(b) also suggest that we can use the modularity measure
as a good criterion for the choice of the regularization parameter for the
Standard Laplacian and Normalized Laplacian methods. Now, let us investigate
the effect of the quantity of the labelled data on the quality of the
classification. Figures 4(a), 4(b) and 4(c) show that on average the modularity
of the classification increases when we increase the quantity of the labelled
data. Moreover, the quality of classification improves significantly when we
increase the quantity of labelled data for each class from a few points to
about 50 points. A further increase of the quantity of the labelled data does
not result in a significant improvement in classification quality. The same
behaviour manifests itself with respect to the precision measure (see Figures
5(a), 5(b) and 5(c)).
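The experimental protocol of repeatedly drawing a fixed number of labeled points per class can be sketched as below. The helper name, the seed and the toy class sizes are assumptions for illustration; the actual experiments use the expert classes DM, MA and AM.

```python
import random

def sample_labeled(classes, per_class, rng):
    """Pick `per_class` labeled nodes uniformly at random from each class.

    `classes[i]` is the true class of node i; the result maps each class to
    the list of node indices that will be revealed as labeled.
    """
    by_class = {}
    for node, k in enumerate(classes):
        by_class.setdefault(k, []).append(node)
    return {k: rng.sample(nodes, per_class)
            for k, nodes in sorted(by_class.items())}

rng = random.Random(0)                    # fixed seed for reproducibility
classes = [0] * 10 + [1] * 20             # hypothetical imbalanced classes
trials = [sample_labeled(classes, per_class=5, rng=rng) for _ in range(100)]
print(trials[0])
```

Averaging a quality measure over such trials, and repeating for growing values of `per_class`, gives curves of the kind described above.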

Both Les Miserables and Wikipedia-math datasets indicate that for the 
PageRank based method it is better to choose the value of the regularization 
parameter as close to one as possible but at the same time keeping the system 
numerically stable and efficient. This is an example of the singular perturbation 
phenomena pi [TB] 
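One practical side of this trade-off can be quantified if the classification functions are computed by the usual fixed-point (power) iteration, which is an assumption of this sketch rather than a statement about how the experiments were run. For the PageRank based method the kernel $W D^{-1}$ is column-stochastic, so each iteration shrinks the error in the 1-norm by a factor $\alpha$, and the number of iterations needed for a given tolerance grows quickly as $\alpha$ approaches one.

```python
import math

def iterations_needed(alpha, tol):
    """Smallest t with alpha**t <= tol, for 0 < alpha < 1 and 0 < tol < 1."""
    return math.ceil(math.log(tol) / math.log(alpha))

for alpha in (0.5, 0.9, 0.99, 0.999):
    print(alpha, iterations_needed(alpha, tol=1e-6))
```

The hypothetical tolerance `1e-6` is only an example; the point is the growth of the iteration count, roughly like $1/(1-\alpha)$, which limits how close to one $\alpha$ can be chosen in practice.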

We have also compared the results obtained by the semi-supervised learning
methods with the classification provided by Wikipedia categories. As Wikipedia
categories we have chosen: Applied_mathematics, Mathematical_analysis and
Discrete_mathematics. It turns out that the precision of the Wikipedia
categories with respect to the expert classification is 78% (with 5 random
labelled points the PageRank based method can achieve about 68%). However, the
recall of the Wikipedia categorization is 72%. With the help of the
semi-supervised learning approach we have classified all articles. It is quite
interesting to observe that, just using the link information, semi-supervised
learning can achieve precision nearly as good as the Wikipedia categorization
produced by the hard work of many experts, and semi-supervised learning can do
even better in terms of recall.
Figure 3: Wikipedia-math example. (a) Modularity as a function of $\alpha$. (b) Precision as a function of $\alpha$.

4 Conclusion and future research 

We have developed a generalized optimization approach for graph-based
semi-supervised learning which contains as particular cases the Standard
Laplacian, Normalized Laplacian and PageRank based methods and provides new
ones based on the parameter $\sigma$. We have provided a new probabilistic
interpretation based on random walks. This interpretation allows us to explain
differences in the performances of the methods. We have also characterized the
limiting behaviour of the methods as $\alpha \to 1$, which is based on the
weight of the labelled points. We have illustrated the theoretical results with
the help of the Les Miserables example and the Wikipedia-math example. Also, we
have shown how the number of labeled points influences the quality of the
classification. Both theoretical and experimental results demonstrate that the
PageRank based method outperforms the other methods in terms of clustering
modularity and robustness with respect to the choice of labelled points and
regularization parameter. We propose to use the modularity measure for the
choice of the regularization parameter in the cases of the Standard Laplacian
method and the Normalized Laplacian method. In the case of the PageRank based
method we suggest choosing the value of the regularization parameter as close
to one as possible while keeping the system numerically stable and efficient.
It appears that, remarkably, we can classify the Wikipedia articles with very
good precision and perfect recall employing only the information about the
hyper-text links. As future research we plan to apply the cross-validation
technique to the choice of the kernel and to apply our approach to inductive
semi-supervised learning [TJ|7], which will help us to work with out-of-sample
data.
Figure 4: Wikipedia-math example. (a) Modularity as a function of $\alpha$ for PageRank. (b) Modularity as a function of $\alpha$ for Normalized Laplacian.

Figure 5: Wikipedia-math example. (b) Precision as a function of $\alpha$ for Normalized Laplacian. (c) Precision as a function of $\alpha$ for Standard Laplacian.